.br
.ul
Data Augmentation:
.pg
Data augmentation is simply
the addition of information
to basic source data.
There are two basic types,
.ul
static
and 
.ul
dynamic.
The static case involves
the addition of constants,
that is, fixed values which are
known to the project specialists
and specified by them
to the experts in the
.ul
computation center,
or assignment of values
to variables
which are derived from the unit
source record or from subsets
of unit source records.
The dynamic case involves
the creation of values for
variables based on the values of
variables in the data set, such as calculating
the day of the week from the calendar date.
Where time is a major analytic variable,
we must structure our data file,
and develop the computer
algorithms to derive and
augment our unit record
data file with the appropriate
values of specified variables
for each unit of specified time.
.m4 1
The optimum strategy for
data augmentation for any
given project is closely
related to the file
management practices
available to and
supported by the computing system.
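.pg
As a minimal sketch of the two cases, the following Python fragment
adds a static constant and a dynamically derived day of week
to a unit record.
The field names, the project code, and the sample record are
hypothetical, and the sketch assumes that each record carries
a full calendar date (year, month, and day), since the day of
month alone does not determine the day of week.
.nf

```python
from datetime import date

# Hypothetical static constants supplied by the project specialists.
STATIC_FIELDS = {"project_code": "P-001"}

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def augment(record):
    """Return a copy of a unit record with static and dynamic fields added."""
    out = dict(record)
    # Static case: attach the fixed, pre-specified values.
    out.update(STATIC_FIELDS)
    # Dynamic case: derive day of week from the date already in the record.
    d = date(record["year"], record["month"], record["day"])
    out["day_of_week"] = DAY_NAMES[d.weekday()]  # weekday(): Monday is 0
    return out

# Hypothetical unit source record.
records = [{"year": 1974, "month": 3, "day": 15, "calls": 120}]
augmented = [augment(r) for r in records]
print(augmented[0]["day_of_week"])  # prints Friday
```
.fi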
.br
.ul
Validation:
.pg
When handling large analytical
problems or large volumes
of data, it is almost always
.m4 3
undesirable or impossible
to check by hand.
Human validation is not effective in
controlling the course of the analytic
processing by a computer.
Of course, in the final stages of
evaluation of the results
by an inference or decision
making group, human validation
and judgement have the greatest
importance.
In the analytical phase of the computerized
project,
validation is of the same level of
importance as the
calculation of the results
themselves.
Computation experts make wide
use of the acronym GIGO,
which stands for GARBAGE IN, GARBAGE
OUT.
If you write a very fine analytic
program but do not
guard against wrong data
being used, or if analytical
validity checks are not
inserted at the critical
points of the analysis,
the results may be misleading,
impossible, or wrong
only in part.
Even when you suspect error,
you will not know which of these
situations has occurred.
.pg
Further, analysis tends
to be done sequentially and
validation points can often
be placed at the break
points of the analysis.
Sound practice suggests that all
critical values should be
screened for validity.
Where no known screen exists,
these critical values should be
included as a part of the results.
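.pg
The screening practice described above might be sketched as follows.
This is an illustration only: the variable names and the valid ranges
are hypothetical and would in practice be supplied by the client.
Values with a known screen are range checked, and values with no
known screen are set aside to be printed with the results.
.nf

```python
def screen(values, screens):
    """Split critical values into failures and unscreened values.

    screens maps a variable name to a (low, high) valid range; the
    names and ranges used below are hypothetical.
    """
    failures, unscreened = {}, {}
    for name, value in values.items():
        if name not in screens:
            # No known screen: report the value alongside the results.
            unscreened[name] = value
            continue
        low, high = screens[name]
        if not (low <= value <= high):
            failures[name] = value
    return failures, unscreened

# Hypothetical screens and one checkpoint of critical values.
screens = {"holding_time_min": (0.0, 120.0), "busy_hour": (0, 23)}
checkpoint = {"holding_time_min": 3.2, "busy_hour": 27, "revenue": 1.5e6}

failures, unscreened = screen(checkpoint, screens)
print("failed screens:", failures)
print("report with results:", unscreened)
```
.fi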
.pg
Large model simulation programs
often introduce
values of variables, both
endogenous and exogenous,
to describe the economic,
engineering, or scientific
situation or population which
is to be studied.
These parameters of the model
should always be brought out
as the leading pages of the
report|-|they are an important part
of the study.
They represent a numerical
interface between the
customer group and the
computation group.
If the project requires
periodic analysis using
time dependent data,
then generally a part of the
analytic results of great importance
to validation is the incremental
changes from one time period
to the next.
When this is the case,
the customer must provide the
validating group with the
means to examine these increments
for evidence of input error.
It may also be possible to
specify constraints on various
classes of increments to select
those most likely to indicate error.
In many complex models, several
runs with "special case" parameters,
for which analytic results are
known, are made as an early
validity trial.
The development of a
set of validation procedures demands
the closest cooperation between the
technical groups of the client
and the computation experts
of the center.
There is a temptation among
inexperienced systems
analysts to regard validation
and documentation as
nonproductive effort,
whereas the wise will regard
these efforts as being of the
same importance as
producing the final answers.
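.pg
The examination of period-to-period increments discussed earlier
in this section can be sketched as below.
The traffic figures and the limit of 200 are hypothetical;
the constraint actually used would be specified by the customer
for each class of increment.
.nf

```python
def flag_increments(series, max_abs_change):
    """Return (period_index, increment) pairs whose size exceeds the limit."""
    flagged = []
    for i in range(1, len(series)):
        inc = series[i] - series[i - 1]
        if abs(inc) > max_abs_change:
            flagged.append((i, inc))
    return flagged

# Hypothetical monthly traffic counts; the fourth value looks like
# an input error.
monthly_traffic = [1000, 1040, 1035, 1900, 1080]
suspect = flag_increments(monthly_traffic, max_abs_change=200)
print(suspect)  # prints [(3, 865), (4, -820)]
```
.fi
A single bad input value normally produces two flagged increments,
one entering the bad period and one leaving it, which helps
to locate the error.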
.br
.ul
Results Report Generation:
.pg
Almost all projects terminate
the computer-computation
phase with report generation.
This consists of first producing
printed information carefully
structured to be as understandable
and as readable as possible.
It may well have an overall
summary section with condensed
information for top management,
a department summary for department
chiefs, and a detail section for
working experts who must evaluate result
details and plan
subsequent trials or analyses.
Example:||Consider the problem
of the model of the Egypt-Italy
Submarine Cable.
The Overall Summary
could show routes, location
of repeaters with type
of cable|-|(armored,
armorless)|-|switching
models (if any), installation
costs, manufacturing costs, manpower
requirements, cutover time,
traffic estimate by time
of day, by year together with
associated revenue for the first
five years.
For the traffic
department there would be a
detailed analysis of traffic
by incoming-versus-outgoing
traffic by telephone, radio
and high and low speed telegraph;
by holding time,
by time of day, by month of year
together with projections
of outgoing busy hour, incoming
busy hour, joint busy hour
so that the overload conditions
can be forecast.
The engineering department responsible
for maintenance would get an analysis
which lists land repeater stations
and mechanisms, together with
their life estimates, replacement
costs, etc.
The detail analysis will give
the length of each section
of coaxial cable, the exact
requirement and location of
the connecting repeater and all
other pertinent details.
.br
.ul
Retention period:
.pg
.m4 5
This is the length of
time that information must
be held.
It is critical with respect
to both input data files
and report results or
output data.
Long retention periods
affect both storage
media and cost.
.br
.ul
Summary Reports:
.pg
These are reports in which a substantial
part of the input data is the set
of detail reports.
Example:||An Annual Report
.m4 3
compiled from twelve monthly reports.
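.pg
A minimal sketch of such a summary follows, in which twelve
monthly detail reports serve as the input to an annual report.
The field names and figures are hypothetical, and simple totals
are assumed to be the wanted summary.
.nf

```python
# Hypothetical monthly detail reports; the field names are illustrative only.
monthly_reports = [
    {"month": m, "calls": 1000 + 10 * m, "revenue": 500.0}
    for m in range(1, 13)
]

def summarize(reports, fields):
    """Total each named field over all of the detail reports."""
    return {f: sum(r[f] for r in reports) for f in fields}

annual_report = summarize(monthly_reports, ["calls", "revenue"])
print(annual_report)
```
.fi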
.br
.ul
Distribution Lists:
.pg
If the report results
are to be
multi-copy and to have
different sections for different
departments then distribution
lists must be made by the client
together with the appointment of
one of its members with authority
.m4 1
to release the report to
the personnel on the
distribution list.
In all this activity the
Computation Center acts as
the agent and assumes no authority
to release.
.br
.ul
Validation of Reports:
.pg
Reports, in general, are not
published or released without
prior approval by an
authorized manager.
With complex computer-oriented
projects this often requires
a special summary report sheet
with those values of specified
project factors which this
manager can use for his evaluation.
.pg
The technical subcommittee
.m4 3
for the project
may request an ancillary report
to aid it in evaluation of
the detail analysis.
Example:||It may request an
analysis of individuals (detail)
in which, for each specified factor,
the computer lists the individuals
having the maximum or minimum value
of that factor,
together with the complete analysis
of that set of individuals.
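.pg
One way such an ancillary listing could be produced is sketched below.
The individuals and the factor names are hypothetical.
All individuals tied at an extreme value are listed.
.nf

```python
def extremes(individuals, factors):
    """For each factor, list the individuals holding its max and min value."""
    result = {}
    for f in factors:
        high = max(r[f] for r in individuals)
        low = min(r[f] for r in individuals)
        result[f] = {
            "max": [r for r in individuals if r[f] == high],
            "min": [r for r in individuals if r[f] == low],
        }
    return result

# Hypothetical detail records for three individuals.
individuals = [
    {"id": "A", "cost": 10, "life": 5},
    {"id": "B", "cost": 30, "life": 2},
    {"id": "C", "cost": 30, "life": 9},
]
listing = extremes(individuals, ["cost", "life"])
for factor, groups in listing.items():
    for kind, rows in groups.items():
        print(factor, kind, [r["id"] for r in rows])
```
.fi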
.br
.ul
Confidentiality:
.pg
Data and related analytic models are
the property and
responsibility of an agency.
Utilization or disclosure
by any other agency without permission
is a violation of confidentiality.
The terms
.ul
security
and 
.ul
privacy
are distinct elements of this
concept, while integrity is
closely related.
.br
.ul
Privacy:
.pg
The right and responsibility of
the agency in charge of
information to protect from intrusion
the information it controls.
Data may be temporarily the
property of an individual
and until he releases it by
publication to his
agency, or the public,
no one outside his chain of command
has the authority to seize
it, or change its status.
.br
Example:||A research geologist
has data on the physical
features of a mineral-rich
region and is using the computer
to map and lay out proposed
access routes.
A computer technologist would like
to use the data in a paper he
is writing on network analysis,
and proposes to borrow
the data without permission.
This would violate the privacy
of this research data and
is not permitted in a computation center.
.pg
There are legal aspects to
the problems of responsibility
for providing privacy,
of invasion of privacy, and
of unauthorized disclosure.
Each legal jurisdiction
has its own set
.m4 5
of pertinent laws and the
computer center's procedures must
be in compliance.
.br
.ul
Security:
.pg
It is the responsibility of the
Computation Center to guard
against any intrusion on privacy,
and to protect the physical forms
of the information (documents, punched
cards, magnetic tape, magnetic disc)
from loss or damage due to innocent, malicious,
.m4 3
or careless actions
of persons.
As discussed in Chapter|2,
page 19 and in Chapter|4,
pages 83-87, the center
is responsible for
maintaining its plant security
to minimize the risk to
the property of its clients.
Both clerical and computer
systems are developed to
meet this responsibility for security.
A computation center never
.m4 1
publicly discloses its
security program.
It will disclose, for
approval, its proposed
security plans for a given
project to the agency leader.
Very tight security on input and
output is expensive; adequate
security is always the goal.
All computation center personnel
having access to sensitive
input or output data are trained
and cautioned about security
violations, just as in a bank,
or medical records section
of a hospital.
.br
.ul
Integrity:
.pg
Physical integrity of data is the property
of maintaining
unchanged in
.ul
value 
all the information of the
project within the
responsibility of the computation center.
The center guarantees not to lose
or damage the data punch card
decks, the data tapes, the
computation algorithms and
programs, and the analytic results and
reports, and takes such actions and
steps as are needed to meet that guarantee.
How to guarantee that the
.m4 3
information delivered by
the client, or produced by
computer analysis is maintained
intact, and without a single
change over the
retention period is a difficult
and serious problem.
The computation center's system of storage
and clerical operations must
be designed for a goal of no
loss of integrity.
When storing data on magnetic
devices, the possible loss of
information on the surface of
the magnetic device due to contamination,
wear, or oxidation must be considered
and an acceptable loss rate must
be approved by the client and
an appropriate redundancy level used.
The customer has prime
responsibility for the physical
integrity of his project.
He must take action determined
by the characteristics of the
media containing his information,
the competency of the center to
meet its responsibilities, and the
relevant costs associated with
each possible action.
.pg
The computation center is responsible
for informing all its clients
about the binary representation of
numbers as it affects the
significance of digits
in both logical and arithmetic
operations since the
.ul
computational integrity
can be compromised.
.pg
For each language supported by
the center, information must be
.ul
given
to the customer about the
internal representation of a
field as it affects this
computational integrity.
In particular, left and
right adjust, single and
double word characteristics,
single and double precision input,
fixed and floating point
numbers must be explicitly defined.
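.pg
The effect of binary representation on significant digits can be
illustrated with a short Python sketch.
It assumes IEEE 754 arithmetic, which Python uses on common machines;
other systems and languages may behave differently, which is the
reason the representation must be documented for each language.
The helper name to_single is hypothetical.
.nf

```python
import struct

# 0.1 and 0.2 have no exact binary representation, so a decimal
# identity that looks obvious can fail in floating point arithmetic.
total = 0.1 + 0.2
print(total == 0.3)              # prints False
print(abs(total - 0.3) < 1e-9)   # prints True: compare with a tolerance

def to_single(x):
    """Round a double precision value to single precision and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2**24 + 1 is exact in double precision but needs 25 significant
# bits, one more than single precision holds.
x = 16777217.0
print(to_single(x))              # prints 16777216.0
```
.fi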
.br
.ul
Redundancy Level:
.pg
Redundancy level is the
number of copies of a
data file required to guarantee
a given physical integrity level.
Example:||Suppose that for a
two year retention period a given
type of magnetic tape has a quality
level of 0.90, that is,
there is for each tape a 10%
chance that some portion of it
will not be readable.
Then if we use a redundancy level
of two, that is, have two copies,
and the copies fail independently,
there will be a (.10) (.10)|=|.01 or 1%
chance that both will be unreadable.
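.pg
The arithmetic of the example generalizes to any number of copies.
The sketch below assumes that copies fail independently of one another,
which may not hold if all copies share one storage room or one
tape batch.
The function names and the 0.5% target are hypothetical.
.nf

```python
def loss_probability(per_copy_failure, copies):
    """Chance that every copy is unreadable, assuming independent failures."""
    return per_copy_failure ** copies

def copies_needed(per_copy_failure, target):
    """Smallest redundancy level whose loss probability is at most target."""
    if not (0.0 < per_copy_failure < 1.0) or target <= 0.0:
        raise ValueError("need 0 < per_copy_failure < 1 and target > 0")
    n = 1
    while loss_probability(per_copy_failure, n) > target:
        n += 1
    return n

# The example from the text: quality level 0.90, two copies.
print(round(loss_probability(0.10, 2), 6))  # prints 0.01
# Hypothetical target: at most a 0.5% chance of losing the file.
print(copies_needed(0.10, 0.005))           # prints 3
```
.fi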
.nx mt58.m.1
